Abstract

Randomization is a fundamental tool used in many theoretical and practical areas of computer
science. We study here the role of randomization in the area of submodular function
maximization. In this area most algorithms are randomized, and in almost all cases the
approximation ratios obtained by current randomized algorithms are superior to the best results
obtained by known deterministic algorithms. Derandomization of algorithms for general
submodular function maximization seems hard since the access to the function is done via a value
oracle. This makes it hard, for example, to apply standard derandomization techniques such as 
conditional expectations. Therefore, an interesting fundamental problem in this area is whether 
randomization is inherently necessary for obtaining good approximation ratios. 

In this work we give evidence that randomization is not necessary for obtaining good
algorithms by presenting a new technique for derandomization of algorithms for submodular function
maximization. Our high level idea is to maintain explicitly a (small) distribution over the states 
of the algorithm, and carefully update it using marginal values obtained from an extreme point 
solution of a suitable linear formulation. We demonstrate our technique on two recent algorithms 
for unconstrained submodular maximization and for maximizing submodular function subject 
to a cardinality constraint. In particular, for unconstrained submodular maximization we
obtain an optimal deterministic 1/2-approximation showing that randomization is unnecessary for
obtaining optimal results for this setting.

1 Introduction


Randomization is a fundamental tool used in many theoretical and practical areas of computer 
science [21 [32]. It is widely used, for example, in cryptology, coding, and the design of approximation 
and online algorithms. In the area of approximation algorithms randomization is used both to 
improve the running time of algorithms and to obtain improved approximation ratios. However, in 
the majority of cases randomization can be removed to obtain an equivalent deterministic algorithm 
with the same approximation ratio (albeit, in some cases, with a longer running time). One 
central field of combinatorial optimization in which it is unclear whether it is possible to avoid 
randomization is the field of submodular function maximization. 

A set function $f : 2^{\mathcal{N}} \to \mathbb{R}$ is submodular if for every $A, B \subseteq \mathcal{N}$: $f(A \cap B) + f(A \cup B) \le
f(A) + f(B)$. An equivalent definition of submodularity, which is perhaps more intuitive, is that of
diminishing returns: $f(A \cup \{u\}) - f(A) \ge f(B \cup \{u\}) - f(B)$ for every $A \subseteq B \subseteq \mathcal{N}$ and $u \notin B$. The
concept of diminishing returns is widely used in economics, and thus, it should come as no surprise 
that utility functions in economics are often submodular. Additionally, submodular functions are 
ubiquitous in various other disciplines, including combinatorics, optimization, information theory, 
operations research, algorithmic game theory and machine learning. A few well known examples 
of submodular functions from these disciplines include cut functions of graphs and hypergraphs, 
rank functions of matroids, entropy, mutual information, coverage functions and budget additive 
functions. Moreover, submodular maximization problems capture well known combinatorial
optimization problems such as Max-Cut [29, 33], Max-DiCut [25], Generalized
Assignment [22] and Max-Facility-Location.
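The diminishing returns definition above can be checked by brute force on a tiny ground set. The following is a minimal sketch, not part of the original text: the helper names (`powerset`, `is_submodular`, `cut_value`) and the toy path graph are illustrative assumptions, and the exhaustive check is only practical for a handful of elements.

```python
from itertools import combinations


def powerset(ground):
    """Yield every subset of `ground` as a frozenset."""
    items = list(ground)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def is_submodular(f, ground):
    """Check f(A + u) - f(A) >= f(B + u) - f(B) for all A subset of B and u not in B."""
    subsets = list(powerset(ground))
    for A in subsets:
        for B in subsets:
            if not A <= B:
                continue
            for u in ground - B:
                if f(A | {u}) - f(A) < f(B | {u}) - f(B):
                    return False
    return True


# Hypothetical toy instance: the cut function of the path 0 - 1 - 2.
EDGES = [(0, 1), (1, 2)]
GROUND = frozenset({0, 1, 2})


def cut_value(S):
    """Number of edges with exactly one endpoint in S."""
    return sum(1 for (a, b) in EDGES if (a in S) != (b in S))
```

Under these assumptions `is_submodular(cut_value, GROUND)` holds, while a function with increasing returns such as `lambda S: len(S) ** 2` fails the check.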

For a general submodular function the explicit representation of the function might be
exponential in the size of its ground set. Hence, it is standard practice to assume that the function is
accessed via a value oracle returning the value of $f(S)$ when given a set $S \subseteq \mathcal{N}$. The use of a value
oracle makes it difficult to use the standard derandomization method of conditional expectations.
Moreover, other standard methods, such as using a small sample space, seem to
fail as well. Indeed, for most scenarios the best current approximation algorithms are randomized,
while known deterministic algorithms achieve only inferior approximation ratios. One example is
the problem of unconstrained submodular maximization, for which the best (optimal)
randomized algorithm has an approximation ratio of 1/2, while the approximation ratio of the best known
deterministic algorithm is only 1/3. Another example is maximizing a monotone submodular
function subject to a matroid constraint. For this problem the best deterministic algorithm
has an approximation ratio of 1/2, while the best (optimal) randomized algorithm has an improved
approximation ratio of $1 - 1/e$. An interesting fundamental problem is whether randomization is
inherently necessary for obtaining good approximation ratios in the field of submodular function
maximization.

1.1 Our results 

In this paper we give evidence that randomization is not necessary for obtaining good algorithms 
for submodular maximization. We present a new technique for derandomization of algorithms for 
submodular maximization based on the following idea. In a typical randomized algorithm the size 
of the support of the distribution implied by the algorithm is exponential (as otherwise it can often 
be trivially enumerated). We show that in certain cases the size of the distribution can be kept 
small (polynomial) throughout the execution of the algorithm. This is done by formulating the set 
of “good” updates to the distribution in each iteration of the algorithm as a linear formulation, 
and then choosing an update step that corresponds to an (optimal) extreme point of the linear
program. We then use the fact that an extreme point for our formulations does not have many
non-integral variables to control the increase in the size of the support of the distribution. This 
allows us to maintain the distribution explicitly throughout the execution. We are not aware of 
any previous results that obtain derandomization via such an approach, and believe that this idea 
may be applicable for other settings as well. 

We demonstrate our technique on two recent algorithms. The first of them is a randomized
1/2-approximation algorithm presented by [3] for the problem of unconstrained submodular
maximization. It is known that the approximation ratio of this algorithm cannot be improved by
any polynomial time algorithm. Using our technique we obtain the following result. In this
result, and throughout the rest of the paper, we use $n$ to denote the size of the ground set $\mathcal{N}$.

Theorem 1.1. Let $f$ be a submodular function. There exists a deterministic algorithm with
an approximation ratio of 1/2 for the problem $\max_{S \subseteq \mathcal{N}} \{f(S)\}$. The algorithm makes $O(n^2)$ value
oracle queries.

Notice that this result shows that randomization is not necessary for obtaining the best possible 
approximation ratio for the problem. However, this comes at a cost, as the number of oracle queries 
made by our deterministic algorithm is $O(n^2)$, while the randomized algorithm only needs $O(n)$
oracle queries. 

Our second result is for maximizing a (non-monotone) submodular function subject to a
cardinality constraint. For this problem the best known randomized algorithm has an approximation
guarantee of $1/e + 0.004$ [5]. This algorithm is based on a combination of two randomized
algorithms, one of which is an elegant randomized greedy algorithm that has an approximation ratio of $1/e$
(on its own). We show that one can obtain an equivalent deterministic algorithm.

Theorem 1.2. Let $f$ be a submodular function. There exists a deterministic algorithm that has
an approximation ratio of $1/e$ for the problem $\max_{|S| \le k} \{f(S)\}$. The algorithm makes $O(k^2 n)$
value oracle queries.

Again, note that our deterministic algorithm makes $O(k^2 n)$ oracle queries, while the randomized
algorithm only needs $O(kn)$ queries.

1.2 Previous Results 

The literature on submodular maximization problems is very large, and therefore, we mention below 
only a few of the most relevant works. Randomization is widely used in submodular maximization. 
In particular, many of the recent algorithms use an extension of submodular functions to fractional 
vectors known as the multilinear extension. Any algorithm using this
extension must be randomized since the only known way to (approximately) evaluate this extension 
is via random sampling. A few examples of randomized algorithms for submodular maximization 
that do not use the multilinear extension can be found in [14, 3ll4l[20]. 

The first provable approximation algorithms for unconstrained submodular maximization were
described by Feige et al. Their best algorithms for the problem achieve a randomized
approximation ratio of $0.4 - o(1)$, and a deterministic approximation ratio of $1/3 - \varepsilon$ (where $\varepsilon$ is an
arbitrarily small positive constant). On the negative side, [16] showed that no algorithm has an
approximation ratio of $1/2 + \varepsilon$ for the problem. The randomized approximation ratio was improved
gradually [23, 18], eventually leading to an optimal linear time 1/2-approximation randomized
algorithm given by [4]. However, the deterministic approximation ratio has not been improved since the
work of Feige et al. Interestingly, a different 1/3-approximation deterministic algorithm for
the problem has also been described, which led some to conjecture that no deterministic algorithm can do better. Huang
and Borodin strengthened this conjecture by showing that a large class of deterministic
algorithms resembling the optimal 1/2-approximation randomized algorithm cannot achieve
1/2-approximation for unconstrained submodular maximization.

The problem of maximizing a (non-monotone) submodular function subject to a matroid
independence constraint (which generalizes a cardinality constraint) was given a deterministic $(1/4 - \varepsilon)$-approximation
algorithm by Lee et al. and a randomized 0.309-approximation algorithm by
Vondrak [34]. Later, the randomized approximation ratio was improved to 0.325 using a simulated
annealing technique [23], and then to $1/e - o(1)$ [19] via an extension of the continuous greedy
algorithm of [8]. All the above randomized results are based on the multilinear extension, and thus,
are quite inefficient. Buchbinder et al. [5] described a simple randomized greedy algorithm designed
specifically for a cardinality constraint, and used this algorithm to achieve a clean approximation
ratio of $1/e$ with a significantly better time complexity. Additionally, [5] also showed that $1/e$ is not
the correct approximation ratio for the case of a cardinality constraint by describing an (inefficient)
polynomial time $(1/e + 0.004)$-approximation algorithm for it. On the hardness side, [23] showed
that no polynomial time algorithm can have an approximation ratio better than 0.491 for the
problem of maximizing a (non-monotone) submodular function subject to a cardinality constraint.

Recent works consider online and streaming variants of the above problems as well as
faster algorithms [6]. 

2 Additional Notation 

For every set $S$ and element $u$, we denote the union $S \cup \{u\}$ by $S + u$, and the expression $S \setminus \{u\}$
by $S - u$. Given a submodular function $f : 2^{\mathcal{N}} \to \mathbb{R}$, the marginal contribution of $u$ to $S$ is denoted
by $f(u \mid S) = f(S + u) - f(S)$. Our algorithms explicitly maintain in each iteration $i$ a (finite)
distribution $\mathcal{D}_i$ over possible states of the algorithm. Each distribution is represented as a multiset
of tuples $\{(p, S)\}$, where $S$ is a state and $p$ is the probability of this state. Naturally, we require
all the $p$-values to be positive and to add up to 1. We denote by $|\mathcal{D}_i|$ the number of tuples in the
distribution $\mathcal{D}_i$, and by $\operatorname{supp}(\mathcal{D}_i)$ the set of states represented by these tuples (which is also the set
of states having a positive probability).

To simplify the presentation of our algorithms, we allow our distributions to contain multiple 
tuples with the same state. Moreover, there might even be multiple identical tuples (this is why the 
distributions have to be multisets). Whenever this happens, the meaning is that the probability of 
a state $S$ is the sum of the $p$-values of tuples containing it. An implementation of our algorithm
can either keep identical state tuples separate, or unify them. Our proofs are independent of such 
details. The pseudocode of our algorithms uses the first option, which requires us to assume the 
following semantic rules regarding multisets: 

• Given multisets $A$ and $B$ of tuples, the multiplicity of a tuple in the union $A \cup B$ is the sum
of its multiplicities in the original sets.

• Given an expression of the form $\{(p(x), S(x)) \mid x \in A\}$, where $A$ is a set and $(p(x), S(x))$ is
a tuple which is a function of an element $x \in A$: if there are multiple $x$ values resulting in a
single tuple $(p, S)$, then we assume that the multiplicity of this tuple is equal to the number
of such $x$ values in $A$.
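The two multiset rules above can be mirrored with plain Python lists, which keep duplicates. This is only an illustrative sketch under that representation choice; the function names `union`, `build` and `state_probabilities` are hypothetical and do not come from the paper.

```python
from collections import defaultdict
from fractions import Fraction


def union(dist_a, dist_b):
    """Multiset union: list concatenation, so multiplicities add up."""
    return dist_a + dist_b


def build(items, p, state):
    """Multiset {(p(x), S(x)) | x in items}: one tuple per x, duplicates kept."""
    return [(p(x), state(x)) for x in items]


def state_probabilities(dist):
    """Probability of each state: the sum of the p-values of tuples containing it."""
    probs = defaultdict(Fraction)
    for p, state in dist:
        probs[state] += p
    return dict(probs)
```

For example, the union of `[(1/4, {1})]` and `[(1/4, {1}), (1/2, {})]` has three tuples, and the state `{1}` receives probability 1/2. Unifying identical states, the alternative the text mentions, amounts to replacing the list by the dictionary returned from `state_probabilities`.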
3 Unconstrained Submodular Maximization

In this section we present a deterministic 1/2-approximation algorithm for the problem $\max\{f(S) :
S \subseteq \mathcal{N}\}$ whose formal description is given as Algorithm 1. Each state in the distribution maintained
by our algorithm is a pair $(X, Y)$ of sets. Technically, this means that the distribution should be
a multiset of tuples $(p, (X, Y))$. However, we abuse notation and use tuples of the form $(p, X, Y)$
instead. Additionally, we write $\mathbb{E}_{\mathcal{D}_i}$ instead of $\mathbb{E}_{(X,Y) \sim \mathcal{D}_i}$ to avoid visual clutter.


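Algorithm 1: Deterministic Unconstrained($f$)

1 Initialize: $\mathcal{D}_0 = \{(1, \varnothing, \mathcal{N})\}$.

2 Denote the elements of $\mathcal{N}$ by $u_1, u_2, \ldots, u_n$, in an arbitrary order (recall that $n = |\mathcal{N}|$).

3 for $i = 1$ to $n$ do

4     $\forall (X, Y) \in \operatorname{supp}(\mathcal{D}_{i-1})$, let $a_i(X) = f(X + u_i) - f(X)$, $b_i(Y) = f(Y - u_i) - f(Y)$.

5     Find an extreme point solution of the following linear formulation:

      (P)  $\mathbb{E}_{\mathcal{D}_{i-1}}[z(X,Y) a_i(X) + w(X,Y) b_i(Y)] \ge 2 \cdot \mathbb{E}_{\mathcal{D}_{i-1}}[z(X,Y) b_i(Y)]$

           $\mathbb{E}_{\mathcal{D}_{i-1}}[z(X,Y) a_i(X) + w(X,Y) b_i(Y)] \ge 2 \cdot \mathbb{E}_{\mathcal{D}_{i-1}}[w(X,Y) a_i(X)]$

           $z(X,Y) + w(X,Y) = 1 \quad \forall (X,Y) \in \operatorname{supp}(\mathcal{D}_{i-1})$

           $z(X,Y), w(X,Y) \ge 0 \quad \forall (X,Y) \in \operatorname{supp}(\mathcal{D}_{i-1})$

6     Construct a new distribution:

      $\mathcal{D}_i \leftarrow \{(z(X,Y) \cdot \Pr_{\mathcal{D}_{i-1}}[(X,Y)], X + u_i, Y) \mid (X,Y) \in \operatorname{supp}(\mathcal{D}_{i-1}), z(X,Y) > 0\}$

            $\cup\ \{(w(X,Y) \cdot \Pr_{\mathcal{D}_{i-1}}[(X,Y)], X, Y - u_i) \mid (X,Y) \in \operatorname{supp}(\mathcal{D}_{i-1}), w(X,Y) > 0\}$.

7 return $\arg\max_{(X,Y) \in \operatorname{supp}(\mathcal{D}_n)} \{f(X)\}$ (equivalent to $\arg\max_{(X,Y) \in \operatorname{supp}(\mathcal{D}_n)} \{f(Y)\}$).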


We begin the analysis of Algorithm 1 with the following simple observations.

Observation 3.1. The following holds for every iteration $1 \le i \le n$ of Algorithm 1:

1. For every state $(X, Y) \in \operatorname{supp}(\mathcal{D}_i)$: $X \cap \{u_{i+1}, \ldots, u_n\} = \varnothing$, $\{u_{i+1}, \ldots, u_n\} \subseteq Y$ and $X \cap
\{u_1, \ldots, u_i\} = Y \cap \{u_1, \ldots, u_i\}$.

2. The total sum of the probabilities in $\mathcal{D}_i$ is 1, and thus, $\mathcal{D}_i$ is a valid distribution.

3. The formulation (P) is feasible. In particular, one feasible solution assigns for every state
$(X, Y) \in \operatorname{supp}(\mathcal{D}_{i-1})$:

$$z(X,Y) = \frac{\max\{0, a_i(X)\}}{\max\{0, a_i(X)\} + \max\{0, b_i(Y)\}} \quad \text{(or 1 if the denominator is 0)}$$

and

$$w(X,Y) = 1 - z(X,Y) = \frac{\max\{0, b_i(Y)\}}{\max\{0, a_i(X)\} + \max\{0, b_i(Y)\}} \quad \text{(or 0 if the denominator is 0)} .$$

4. For any extreme point of (P) there are at most $2 + |\mathcal{D}_{i-1}|$ non-zero variables. Thus, $|\mathcal{D}_i| \le
2 + |\mathcal{D}_{i-1}|$ and $|\mathcal{D}_n| \le 2n + 1$.
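The feasible assignment of item 3 is easy to evaluate numerically. The sketch below is an illustration rather than part of the paper: `feasible_split` and `satisfies_P` are hypothetical helper names, and the input is assumed to be a list of `(probability, a, b)` triples standing in for the states of $\mathcal{D}_{i-1}$ with $a = a_i(X)$, $b = b_i(Y)$ and $a + b \ge 0$.

```python
from fractions import Fraction


def feasible_split(a, b):
    """Item 3 assignment for one state: returns (z, w) with z + w = 1."""
    pos_a = max(Fraction(0), Fraction(a))
    pos_b = max(Fraction(0), Fraction(b))
    denom = pos_a + pos_b
    if denom == 0:
        return Fraction(1), Fraction(0)  # the "denominator is 0" case
    return pos_a / denom, pos_b / denom


def satisfies_P(states):
    """Check both expectation constraints of (P) under the item 3 assignment.

    states: list of (probability, a, b) triples (an assumed encoding).
    """
    lhs = rhs_first = rhs_second = Fraction(0)
    for p, a, b in states:
        z, w = feasible_split(a, b)
        lhs += p * (z * a + w * b)   # E[z * a_i + w * b_i]
        rhs_first += p * z * b       # E[z * b_i]
        rhs_second += p * w * a      # E[w * a_i]
    return lhs >= 2 * rhs_first and lhs >= 2 * rhs_second
```

Exact rational arithmetic is used so that the two inequalities can be compared without a numerical tolerance.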
Remark: Item 4 of Observation 3.1 is the only place in the analysis of Algorithm 1 where we use
the fact that the algorithm finds an extreme point solution of (P). Claim A.1 shows that one can in
fact compute a solution for (P) having at most $1 + |\mathcal{D}_{i-1}|$ non-zero variables. Moreover, it is possible
to compute such a solution in near linear time, implying that an implementation of Algorithm 1 does
not require an LP solver.

Proof. The proof of the observation is by induction on $i$. Assume the observation holds for every
$1 \le i' < i$, and let us prove it for $i$. Observe that item 1 holds for $i - 1$ either by the induction
hypothesis or (for $i = 1$) by the fact that $\mathcal{D}_0$ trivially obeys it. Additionally, it is easy to see that
every state of $\mathcal{D}_i$ is obtained from a state of $\mathcal{D}_{i-1}$ by either adding $u_i$ to $X$ or removing $u_i$ from $Y$.
Hence item 1 is maintained. Similarly, item 2 holds for $i - 1$. Since $z(X,Y) + w(X,Y) = 1$ for
every state $(X, Y) \in \operatorname{supp}(\mathcal{D}_{i-1})$, the distribution $\mathcal{D}_i$ contains at most two tuples resulting from $(X, Y)$,
and these tuples exactly split the probability $(X, Y)$ has in $\mathcal{D}_{i-1}$. Thus, the sum of the probabilities
in $\mathcal{D}_i$ is equal to their sum in $\mathcal{D}_{i-1}$.

To see why item 3 holds, observe first that by item 1 $X \subseteq Y \setminus \{u_i\}$ for each $(X, Y) \in \operatorname{supp}(\mathcal{D}_{i-1})$,
and therefore, by submodularity,

$$a_i(X) + b_i(Y) = [f(X + u_i) - f(X)] + [f(Y - u_i) - f(Y)] \ge 0 .$$

Next, we prove that the first constraint of (P) is satisfied by the assignment suggested by item 3.
Proving that the second constraint of (P) is satisfied by the assignment can be done similarly.

If $a_i(X) = b_i(Y) = 0$ for some state $(X, Y)$, then $(X, Y)$ does not contribute to either side of the
first constraint of (P), and we may ignore it. Thus, we may assume that either $a_i(X)$ or $b_i(Y)$ is
non-zero for every state; since $a_i(X) + b_i(Y) \ge 0$, at least one of them is then strictly positive, so the
denominators below are non-zero. Plugging the assignment suggested by item 3 under this assumption
into the first constraint in (P), we get:


$$\begin{aligned}
\mathbb{E}_{\mathcal{D}_{i-1}}[z(X,Y) \cdot a_i(X) + w(X,Y) \cdot b_i(Y)]
&= \mathbb{E}_{\mathcal{D}_{i-1}}\left[\frac{\max\{0, a_i(X)\} \cdot a_i(X) + \max\{0, b_i(Y)\} \cdot b_i(Y)}{\max\{0, a_i(X)\} + \max\{0, b_i(Y)\}}\right] \\
&\ge 2 \cdot \mathbb{E}_{\mathcal{D}_{i-1}}\left[\frac{\max\{0, a_i(X)\} \cdot b_i(Y)}{\max\{0, a_i(X)\} + \max\{0, b_i(Y)\}}\right] \\
&= 2 \cdot \mathbb{E}_{\mathcal{D}_{i-1}}[z(X,Y) \cdot b_i(Y)] .
\end{aligned}$$

The inequality holds even without the expectation due to the following argument: if either
$a_i(X) \le 0$ or $b_i(Y) \le 0$, then the LHS is non-negative while the RHS is non-positive. On the other
hand, if $a_i(X) > 0$ and $b_i(Y) > 0$ the inequality reduces to $a_i^2(X) + b_i^2(Y) \ge 2 a_i(X) b_i(Y)$, which
clearly holds.

Item 4 follows immediately by the properties of an extreme point. Since (P) has $2 + |\mathcal{D}_{i-1}|$
constraints, an extreme point of (P) has at most $2 + |\mathcal{D}_{i-1}|$ non-zero variables. Since a single tuple
is added to $\mathcal{D}_i$ for every non-zero variable, the size of $\mathcal{D}_i$ is upper bounded by $2 + |\mathcal{D}_{i-1}|$. □

Let $OPT \subseteq \mathcal{N}$ be the optimal solution for the problem $\max\{f(S) : S \subseteq \mathcal{N}\}$ that we want to
approximate, and let $OPT(X, Y)$ be a shorthand for the set $(OPT \cup X) \cap Y$. The following is
the main lemma we need in order to analyze Algorithm 1.


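Lemma 3.2. For any iteration $1 \le i \le n$ of Algorithm 1:

$$\mathbb{E}_{\mathcal{D}_i}[f(X) + f(Y)] - \mathbb{E}_{\mathcal{D}_{i-1}}[f(X) + f(Y)] \ge 2 \cdot \left(\mathbb{E}_{\mathcal{D}_{i-1}}[f(OPT(X,Y))] - \mathbb{E}_{\mathcal{D}_i}[f(OPT(X,Y))]\right) .$$

Proof. Observe that whenever $u_i \notin OPT$:

$$\begin{aligned}
\mathbb{E}_{\mathcal{D}_{i-1}}[f(OPT(X,Y))] - \mathbb{E}_{\mathcal{D}_i}[f(OPT(X,Y))]
&= \mathbb{E}_{\mathcal{D}_{i-1}}[z(X,Y) \cdot (f(OPT(X,Y)) - f(OPT(X + u_i, Y)))] \\
&\le \mathbb{E}_{\mathcal{D}_{i-1}}[z(X,Y) \cdot (f(Y - u_i) - f(Y))] = \mathbb{E}_{\mathcal{D}_{i-1}}[z(X,Y) \cdot b_i(Y)] ,
\end{aligned}$$

where the inequality follows by submodularity since $OPT(X, Y) \subseteq Y - u_i$. A similar argument can
be used to show that whenever $u_i \in OPT$:

$$\mathbb{E}_{\mathcal{D}_{i-1}}[f(OPT(X,Y))] - \mathbb{E}_{\mathcal{D}_i}[f(OPT(X,Y))] \le \mathbb{E}_{\mathcal{D}_{i-1}}[w(X,Y) \cdot a_i(X)] .$$

The lemma now follows by combining the above observations with the next inequality:

$$\begin{aligned}
\mathbb{E}_{\mathcal{D}_i}[f(X) + f(Y)] - \mathbb{E}_{\mathcal{D}_{i-1}}[f(X) + f(Y)]
&= \mathbb{E}_{\mathcal{D}_{i-1}}[z(X,Y) \cdot (f(X + u_i) - f(X)) + w(X,Y) \cdot (f(Y - u_i) - f(Y))] \\
&= \mathbb{E}_{\mathcal{D}_{i-1}}[z(X,Y) \cdot a_i(X) + w(X,Y) \cdot b_i(Y)] \\
&\ge 2 \cdot \max\left\{\mathbb{E}_{\mathcal{D}_{i-1}}[z(X,Y) \cdot b_i(Y)],\ \mathbb{E}_{\mathcal{D}_{i-1}}[w(X,Y) \cdot a_i(X)]\right\} ,
\end{aligned}$$

where the inequality follows by the constraints of (P). □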

We can now prove the next theorem, which implies Theorem 1.1.

Theorem 3.3. Algorithm 1 is a 1/2-approximation algorithm performing $O(n^2)$ value oracle queries.

Proof. Adding up Lemma 3.2 over $1 \le i \le n$ we get:

$$\mathbb{E}_{\mathcal{D}_n}[f(X) + f(Y)] - \mathbb{E}_{\mathcal{D}_0}[f(X) + f(Y)] \ge 2 \cdot \left(\mathbb{E}_{\mathcal{D}_0}[f(OPT(X,Y))] - \mathbb{E}_{\mathcal{D}_n}[f(OPT(X,Y))]\right) .$$

The single state in the support of $\mathcal{D}_0$ is $(\varnothing, \mathcal{N})$. Hence, $\mathbb{E}_{\mathcal{D}_0}[f(OPT(X,Y))] = f(OPT(\varnothing, \mathcal{N})) =
f(OPT)$. On the other hand, for every state $(X_n, Y_n) \in \operatorname{supp}(\mathcal{D}_n)$ we have $X_n = Y_n$ by
Observation 3.1, and thus, $OPT(X_n, Y_n) = X_n = Y_n$. Plugging all these observations into the last
inequality gives:

$$2 \cdot \mathbb{E}_{\mathcal{D}_n}[f(X)] - (f(\varnothing) + f(\mathcal{N})) \ge 2 \cdot \left(f(OPT) - \mathbb{E}_{\mathcal{D}_n}[f(X)]\right) .$$

Using an averaging argument and the non-negativity of $f$ we now get:

$$\max_{(X,Y) \in \operatorname{supp}(\mathcal{D}_n)} f(X) \ge \mathbb{E}_{\mathcal{D}_n}[f(X)] \ge \frac{f(OPT)}{2} + \frac{f(\varnothing) + f(\mathcal{N})}{4} \ge \frac{f(OPT)}{2} .$$

Observation 3.1 shows that $|\mathcal{D}_i| \le 2i + 1$ for every $1 \le i \le n$. Since Algorithm 1 performs
2 oracle queries for every state in $\operatorname{supp}(\mathcal{D}_{i-1})$, the number of such queries done during the $i$-th
iteration is at most $4i - 2$. Adding up the last bound over all iterations we get a bound of $O(n^2)$ on
the total number of oracle queries made by the algorithm. □
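The explicit-distribution idea can be tried on a toy instance. The sketch below is a simplification and not Algorithm 1 itself: instead of an extreme point solution of (P) it uses the feasible assignment of item 3 of Observation 3.1, so the 1/2 guarantee of the analysis still applies but the support may grow exponentially instead of staying at most $2n + 1$. The function names and the directed-cut example are illustrative assumptions.

```python
from fractions import Fraction


def explicit_distribution_sketch(f, ground):
    """Maintain the distribution over states (X, Y) explicitly.

    Uses the item 3 assignment (feasible for (P), not an extreme point),
    hence there is no polynomial bound on the support size here.
    """
    elements = sorted(ground)
    dist = [(Fraction(1), frozenset(), frozenset(elements))]  # D_0
    for u in elements:
        new_dist = []
        for p, X, Y in dist:
            a = Fraction(f(X | {u}) - f(X))  # a_i(X)
            b = Fraction(f(Y - {u}) - f(Y))  # b_i(Y)
            pos_a, pos_b = max(Fraction(0), a), max(Fraction(0), b)
            denom = pos_a + pos_b
            z = pos_a / denom if denom != 0 else Fraction(1)
            w = 1 - z
            if z > 0:
                new_dist.append((p * z, X | {u}, Y))
            if w > 0:
                new_dist.append((p * w, X, Y - {u}))
        dist = new_dist
    # At the end X == Y for every state; return the best state found.
    return max((X for _, X, _ in dist), key=f)


# Hypothetical toy instance: a directed cut function (non-negative, non-monotone).
ARCS = [(0, 1), (1, 2), (2, 0), (0, 3)]


def directed_cut(S):
    """Number of arcs leaving S."""
    return sum(1 for (a, b) in ARCS if a in S and b not in S)
```

Calling `explicit_distribution_sketch(directed_cut, {0, 1, 2, 3})` returns a set whose value is at least half of the optimum, which can be confirmed by enumerating all 16 subsets.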

4 Cardinality Constraints 

In this section we present a deterministic $1/e$-approximation algorithm for the problem $\max\{f(S) :
|S| \le k\}$ whose formal description is given as Algorithm 2. Each state in the distribution maintained
by our algorithm is a set $S$.

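We first make the following simple observations.

Algorithm 2: Deterministic Cardinality($f$, $k$)

1 Initialize: $\mathcal{D}_0 = \{(1, \varnothing)\}$.

2 for $i = 1$ to $k$ do

3     Let $M_i \subseteq \mathcal{N}$ be a subset of at most $k$ elements maximizing $\sum_{u \in M_i} \mathbb{E}_{S \sim \mathcal{D}_{i-1}}[f(u \mid S)]$.

4     Find an optimal extreme point solution of the following linear formulation:

      (P)  $\max \sum_{u \in M_i} \mathbb{E}_{S \sim \mathcal{D}_{i-1}}[x(u,S) \cdot f(u \mid S)]$

           $\mathbb{E}_{S \sim \mathcal{D}_{i-1}}[x(u,S)] \le (1/k) \cdot \Pr_{S \sim \mathcal{D}_{i-1}}[u \notin S] \quad \forall u \in M_i$

           $\sum_{u \in M_i} x(u,S) + \ell(S) = 1 \quad \forall S \in \operatorname{supp}(\mathcal{D}_{i-1})$

           $x(u,S), \ell(S) \ge 0 \quad \forall u \in M_i, S \in \operatorname{supp}(\mathcal{D}_{i-1})$

5     Construct a new distribution:

      $\mathcal{D}_i \leftarrow \{(x(u,S) \cdot \Pr_{\mathcal{D}_{i-1}}[S], S + u) \mid u \in M_i, S \in \operatorname{supp}(\mathcal{D}_{i-1}), x(u,S) > 0\}$

            $\cup\ \{(\ell(S) \cdot \Pr_{\mathcal{D}_{i-1}}[S], S) \mid S \in \operatorname{supp}(\mathcal{D}_{i-1}), \ell(S) > 0\}$.

6 Return $\arg\max_{S \in \operatorname{supp}(\mathcal{D}_k)} \{f(S)\}$.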


Observation 4.1. The following holds for every iteration $i = 1, \ldots, k$ of Algorithm 2:

1. The assignment $x(u,S) = (1/k) \cdot \mathbb{1}[u \notin S]$ and $\ell(S) = 1 - |M_i \setminus S|/k$ for every $S \in \operatorname{supp}(\mathcal{D}_{i-1})$
and $u \in M_i$ is a feasible assignment for the formulation (P).

2. The total sum of the probabilities in $\mathcal{D}_i$ is 1, and thus, $\mathcal{D}_i$ is a valid distribution.

3. $|\mathcal{D}_i| \le k + |\mathcal{D}_{i-1}|$. Thus, $|\mathcal{D}_k| \le k^2 + 1$.

Proof. The proof of the observation is by induction on $i$. Assume the observation holds for every
$1 \le i' < i$, and let us prove it for $i$. It is easy to verify that item 1 holds given that $\mathcal{D}_{i-1}$ is a
valid distribution. To see why item 2 holds, observe that the sum of the probabilities in $\mathcal{D}_i$ is:

$$\sum_{S \in \operatorname{supp}(\mathcal{D}_{i-1})} \Pr_{\mathcal{D}_{i-1}}[S] \cdot \left(\sum_{u \in M_i} x(u,S) + \ell(S)\right) = \sum_{S \in \operatorname{supp}(\mathcal{D}_{i-1})} \Pr_{\mathcal{D}_{i-1}}[S] = 1 .$$

Finally, to prove item 3 notice that the number of constraints in (P) at iteration $i$ is at most
$k + |\operatorname{supp}(\mathcal{D}_{i-1})| \le k + |\mathcal{D}_{i-1}|$. By the properties of extreme point solutions the total number of
variables that are strictly greater than zero is upper bounded by the number of (tight) constraints.
Since a single set is added to $\mathcal{D}_i$ for every non-zero $x(u,S)$ or $\ell(S)$ variable, the size of $\mathcal{D}_i$ is also
upper bounded by $k + |\mathcal{D}_{i-1}|$. □
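Item 1 of Observation 4.1 can be verified mechanically for a given distribution. The following sketch is illustrative only: `item1_assignment` and `is_feasible` are hypothetical names, the distribution is assumed to be a list of `(probability, frozenset)` tuples, and `M` is assumed to be a frozenset with at most `k` elements.

```python
from fractions import Fraction


def item1_assignment(M, S, k):
    """x(u, S) = (1/k) * 1[u not in S] and l(S) = 1 - |M \\ S| / k."""
    x = {u: (Fraction(1, k) if u not in S else Fraction(0)) for u in M}
    ell = 1 - Fraction(len(M - S), k)
    return x, ell


def is_feasible(dist, M, k):
    """Check the constraints of (P) in Algorithm 2 for the item 1 assignment."""
    expected_x = {u: Fraction(0) for u in M}
    prob_missing = {u: Fraction(0) for u in M}
    for p, S in dist:
        x, ell = item1_assignment(M, S, k)
        # Second constraint and non-negativity, checked state by state.
        if ell < 0 or sum(x.values()) + ell != 1:
            return False
        for u in M:
            expected_x[u] += p * x[u]
            if u not in S:
                prob_missing[u] += p
    # First constraint: E[x(u, S)] <= (1/k) * Pr[u not in S] for every u in M.
    return all(expected_x[u] <= prob_missing[u] / k for u in M)
```

With this assignment the first constraint holds with equality, which is why the optimal LP value can only be larger than the value this assignment achieves.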


The next lemma bounds the probability of an item to be in a set chosen according to the
distributions defined by Algorithm 2.

Lemma 4.2. For every element $u \in \mathcal{N}$ and $0 \le i \le k$:

$$\Pr_{S \sim \mathcal{D}_i}[u \notin S] \ge (1 - 1/k)^i .$$

Proof. The proof of the lemma is by induction on $i$. The distribution $\mathcal{D}_0$ gives a probability 1 to
the empty set, and thus, $\Pr_{S \sim \mathcal{D}_0}[u \notin S] = 1$ for every $u \in \mathcal{N}$, i.e., the base case $i = 0$ holds. Next,
assume the lemma holds for $i - 1 \ge 0$, and let us prove it for $i$.

For simplicity of notation, let us define $x(u,S)$ to be 0 for every $u \notin M_i$. A set $S \in \operatorname{supp}(\mathcal{D}_i)$
contains the element $u$ in two cases: if it is constructed from a set in the support of $\mathcal{D}_{i-1}$ that
contains $u$, or it is constructed by adding $u$ to a set in the support of $\mathcal{D}_{i-1}$. Using this observation
we get the bound:

$$\begin{aligned}
\Pr_{S \sim \mathcal{D}_i}[u \notin S]
&= \sum_{\substack{S \in \operatorname{supp}(\mathcal{D}_{i-1}) \\ u \notin S}} \Pr_{\mathcal{D}_{i-1}}[S] \cdot \left(\sum_{u' \in M_i \setminus \{u\}} x(u',S) + \ell(S)\right) \\
&\ge \sum_{\substack{S \in \operatorname{supp}(\mathcal{D}_{i-1}) \\ u \notin S}} \Pr_{\mathcal{D}_{i-1}}[S] \cdot \left(\sum_{u' \in M_i} x(u',S) + \ell(S)\right) - \sum_{S \in \operatorname{supp}(\mathcal{D}_{i-1})} x(u,S) \cdot \Pr_{\mathcal{D}_{i-1}}[S] \\
&= \Pr_{S \sim \mathcal{D}_{i-1}}[u \notin S] - \mathbb{E}_{S \sim \mathcal{D}_{i-1}}[x(u,S)] \ge (1 - 1/k) \cdot \Pr_{S \sim \mathcal{D}_{i-1}}[u \notin S] \ge (1 - 1/k)^i ,
\end{aligned}$$

where the second equality holds by the second constraint of (P), the second inequality holds by the
first constraint of (P) and the last inequality holds by the induction hypothesis. □

The following lemma is an immediate implication of Lemma 2.2 of [5]. For completeness, we
give an independent proof of it in Appendix A.

Lemma 4.3. For any subset $T$ and a distribution $\mathcal{D}$,

$$\mathbb{E}_{S \sim \mathcal{D}}[f(T \cup S)] \ge f(T) \cdot \min_{u \in \mathcal{N}} \Pr_{S \sim \mathcal{D}}[u \notin S] .$$


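The next lemma is the last component we need in order to analyze Algorithm 2.

Lemma 4.4. For any iteration $1 \le i \le k$ of Algorithm 2:

$$\mathbb{E}_{S \sim \mathcal{D}_i}[f(S)] - \mathbb{E}_{S \sim \mathcal{D}_{i-1}}[f(S)] \ge (1/k) \cdot \mathbb{E}_{S \sim \mathcal{D}_{i-1}}[f(OPT \cup S) - f(S)] .$$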

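Proof. One can view the construction of $\mathcal{D}_i$ in the following way: the probability of every set
$S \in \operatorname{supp}(\mathcal{D}_{i-1})$ is split. A fraction of $\ell(S)$ of this probability is kept for $S$, and for every $u \in M_i$ a
fraction of $x(u,S)$ of this probability is transferred to $S + u$. Using this view we get:

$$\begin{aligned}
\mathbb{E}_{S \sim \mathcal{D}_i}[f(S)] - \mathbb{E}_{S \sim \mathcal{D}_{i-1}}[f(S)]
&= \sum_{u \in M_i} \mathbb{E}_{S \sim \mathcal{D}_{i-1}}[x(u,S) \cdot f(u \mid S)] \ge \frac{1}{k} \cdot \sum_{u \in M_i} \mathbb{E}_{S \sim \mathcal{D}_{i-1}}[f(u \mid S)] \\
&\ge \frac{1}{k} \cdot \sum_{u \in OPT} \mathbb{E}_{S \sim \mathcal{D}_{i-1}}[f(u \mid S)] \ge \frac{1}{k} \cdot \mathbb{E}_{S \sim \mathcal{D}_{i-1}}[f(OPT \cup S) - f(S)] ,
\end{aligned}$$

where the first inequality holds since the solution found by Algorithm 2 must be at least as good
as the feasible solution given by Observation 4.1, the second inequality holds by the definition of
$M_i$ and the last inequality holds by submodularity. □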

We can now prove the next theorem, which implies Theorem 1.2 (note that Theorem 1.2 is
trivial for $k = 1$).

Theorem 4.5. For $k \ge 2$, the approximation ratio of Algorithm 2 is at least $(1 - 1/k)^{k-1} \ge 1/e$,
and it performs $O(k^2 n)$ oracle queries.

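Proof. By combining Lemmata 4.2, 4.3 and 4.4 we get

$$\mathbb{E}_{S \sim \mathcal{D}_i}[f(S)] - \mathbb{E}_{S \sim \mathcal{D}_{i-1}}[f(S)] \ge \frac{1}{k} \cdot \mathbb{E}_{S \sim \mathcal{D}_{i-1}}\left[\left(1 - \frac{1}{k}\right)^{i-1} \cdot f(OPT) - f(S)\right] . \quad (1)$$

Next, we prove by induction that:

$$\mathbb{E}_{S \sim \mathcal{D}_i}[f(S)] \ge \frac{i}{k} \cdot \left(1 - \frac{1}{k}\right)^{i-1} \cdot f(OPT) .$$

For $i = 0$, this is true since $f(\varnothing) \ge 0 = (0/k) \cdot (1 - 1/k)^{-1} \cdot f(OPT)$. Assume now that the
claim holds for every $i' < i$, and let us prove it for $i > 0$.

$$\begin{aligned}
\mathbb{E}_{S \sim \mathcal{D}_i}[f(S)]
&\ge \mathbb{E}_{S \sim \mathcal{D}_{i-1}}[f(S)] + \frac{1}{k} \cdot \mathbb{E}_{S \sim \mathcal{D}_{i-1}}\left[\left(1 - \frac{1}{k}\right)^{i-1} \cdot f(OPT) - f(S)\right] \\
&= \left(1 - \frac{1}{k}\right) \cdot \mathbb{E}_{S \sim \mathcal{D}_{i-1}}[f(S)] + \frac{1}{k} \cdot \left(1 - \frac{1}{k}\right)^{i-1} \cdot f(OPT) \\
&\ge \left(1 - \frac{1}{k}\right) \cdot \frac{i-1}{k} \cdot \left(1 - \frac{1}{k}\right)^{i-2} \cdot f(OPT) + \frac{1}{k} \cdot \left(1 - \frac{1}{k}\right)^{i-1} \cdot f(OPT) \\
&= \frac{i}{k} \cdot \left(1 - \frac{1}{k}\right)^{i-1} \cdot f(OPT) ,
\end{aligned}$$

where the first inequality follows by inequality (1), and the second inequality follows by the
induction hypothesis.

The approximation ratio guaranteed by the theorem follows immediately by plugging $i = k$ into the
induction hypothesis. Finally, Observation 4.1 implies $|\mathcal{D}_i| \le ik + 1$, and thus, in the $i$-th iteration
Algorithm 2 makes at most $n \cdot |\operatorname{supp}(\mathcal{D}_{i-1})| \le n \cdot |\mathcal{D}_{i-1}| \le nik$ oracle queries. Thus, the total number
of oracle queries in all the iterations is $O(k^2 n)$. □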
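The induction in the proof above can be sanity-checked numerically. The sketch below (an illustration, with the hypothetical function name `lower_bound_by_recurrence`) iterates the recurrence $a_i = (1 - 1/k) \cdot a_{i-1} + (1/k) \cdot (1 - 1/k)^{i-1}$ with $a_0 = 0$, measured in units of $f(OPT)$, and confirms that it matches the closed form $(i/k) \cdot (1 - 1/k)^{i-1}$ at every step.

```python
import math
from fractions import Fraction


def lower_bound_by_recurrence(k):
    """Return a_k for the recurrence implied by inequality (1), in units of f(OPT)."""
    q = 1 - Fraction(1, k)
    a = Fraction(0)  # a_0 = 0, since f is non-negative
    for i in range(1, k + 1):
        a = q * a + Fraction(1, k) * q ** (i - 1)
        # Closed form claimed by the induction hypothesis.
        assert a == Fraction(i, k) * q ** (i - 1)
    return a
```

For every $k \ge 2$ the returned value equals $(1 - 1/k)^{k-1}$, which decreases towards $1/e$ as $k$ grows and never drops below it.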


4.1 A Tight Example for Algorithm 2

In this section we give an example of a “bad” instance for which Algorithm 2 has an approximation
ratio of at most $e^{-1} + O(1/k)$. Specifically, the optimal solution for the instance we describe has a
value of at least 1, while Algorithm 2 may produce a set of value at most $e^{-1} + O(1/k)$. In the rest
of this section we assume $k$ is larger than some arbitrary constant t (to be determined later).

The ground set $\mathcal{N}$ of our bad instance is the union of two sets $O$ and $Y$, both of size $k$ (if one
wishes to have $n > 2k$, it is possible to add an arbitrary number of elements that do not affect the
objective function). The objective function of the instance is the function $f : 2^{\mathcal{N}} \to \mathbb{R}^+$ defined as
follows,
(- E i 2 ) • 

where $g : [0,1] \to [0,1]$ is a function given by the following formula:

$$g(x) = \begin{cases} (x - 1) \cdot \ln(1 - x) & \text{for } 0 \le x \le 1 - e^{-1} , \\ e^{-1} & \text{for } 1 - e^{-1} \le x \le 1 . \end{cases}$$


m = 


|SnO| 


l - 


\snY\ 


+ 


|5ny|\ e-\snY\ 


k 


+ 


k 2 
Observe that $g(x)$ is a continuous function. Additionally, we note that,

$$g'(x) = \begin{cases} 1 + \ln(1 - x) & \text{for } 0 \le x \le 1 - e^{-1} , \\ 0 & \text{for } 1 - e^{-1} \le x \le 1 , \end{cases}$$

and

$$g''(x) = \begin{cases} -\frac{1}{1 - x} & \text{for } 0 \le x < 1 - e^{-1} , \\ 0 & \text{for } 1 - e^{-1} < x \le 1 . \end{cases}$$

Observe that $g'$ is always non-negative and $g''$ is always non-positive. Thus, $g$ is a non-decreasing
continuous concave function with $g(0) = 0$ and $g(1) = e^{-1}$.
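The stated properties of $g$ can be spot-checked on a grid. This is a small numerical illustration under the piecewise formula given above, not a proof; the names `BREAK`, `g` and `g_prime` are chosen here for convenience.

```python
import math

BREAK = 1 - math.exp(-1)  # the point 1 - e^{-1} where the two pieces meet


def g(x):
    """g(x) = (x - 1) * ln(1 - x) up to the breakpoint, and e^{-1} afterwards."""
    if x <= BREAK:
        return (x - 1) * math.log(1 - x)
    return math.exp(-1)


def g_prime(x):
    """Derivative of g: 1 + ln(1 - x) up to the breakpoint, and 0 afterwards."""
    return 1 + math.log(1 - x) if x <= BREAK else 0.0
```

Evaluating `g` on an evenly spaced grid of $[0, 1]$ shows values that never decrease and increments that never grow, matching the claims that $g$ is non-decreasing and concave.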

It is useful to find expressions for the marginal contribution of an element $u \in \mathcal{N}$ to a set $S \subseteq \mathcal{N}$
given the objective $f$. Let $y = |S \cap Y|/k$ and $x = |S \cap O|/k$, then


f(u \S) = -- + (l-x) \^g(y + V*) - g(y) + 
f(u\S) = y({l-y)- (g{y) + ^ 


k 2 


Observation 4.6. The function f is a submodular function. 


Vw €Y\S 
Vu £ 0\S 


(2) 

(3) 


Proof. The marginal (2) of an element $u \in Y$ is a decreasing function of $x$ since $g$ is non-decreasing
and of $y$ since $g$ is concave. On the other hand, the marginal (3) of an element $u \in O$ is independent
of $x$ and a decreasing function of $y$ since $g$ is non-decreasing. □

In Appendix B we complete the proof. We show that given the above bad instance, Algorithm 2
may terminate with a distribution over subsets of $Y$. The value of $f$ for any such set $S$ is at most:


m=g 


k 


e-\s | i i 

+ + k 


= e -1 + O 


On the other hand, $O$ is a feasible solution and $f(O) = 1$, which completes the proof.


5 Conclusions 

In this paper we proposed a new technique for derandomization of algorithms in the area of
submodular function maximization. For unconstrained submodular maximization we showed that
randomization is not necessary for obtaining the best possible approximation ratio. For submodular
maximization with a cardinality constraint we obtained nearly the best known result. 

The main interesting open question is whether algorithms that are based on the multilinear 
extension can be derandomized. In particular, it is interesting whether the continuous greedy 
approach [8] used to obtain optimal results for maximizing a monotone submodular function
subject to a matroid independence constraint can be derandomized. One possible direction is to
try to approximate the multilinear extension function deterministically using its special properties.
Another interesting question is whether the number of oracle calls of the deterministic algorithms
can be reduced. A possible way to speed up algorithms produced by our method is to keep the size
of the distributions small by avoiding splitting sets when this results in sets of a too low probability. 
As long as only low probability sets are affected, this should not significantly decrease the quality 
of the output, while reducing the number of oracle queries needed (and speeding up the algorithm).

A Omitted Proofs


The following claim slightly improves item 4 of Observation 3.1 and shows that every iteration of
Algorithm 1 can be implemented in near linear time.

Claim A.1. Let (P) be the formulation in Algorithm 1. There always exists a solution to (P) with
at most $1 + |\mathcal{D}_{i-1}|$ non-zero variables. Moreover, this solution can be computed in near linear time.

Proof. To get the desired solution of (P) we need to make a few simple manipulations to (P). First,
we replace the first constraint of (P) with an objective function asking to maximize:

$$\mathbb{E}_{\mathcal{D}_{i-1}}[z(X,Y) \cdot a_i(X) + w(X,Y) \cdot b_i(Y)] - 2 \cdot \mathbb{E}_{\mathcal{D}_{i-1}}[z(X,Y) \cdot b_i(Y)] .$$

Since (P) is feasible by Observation 3.1, any optimal solution for the new formulation is a feasible
solution for (P). We can simplify the new objective by removing constants and using the
fact that $w(X,Y)$ is fully determined by $z(X,Y)$ due to the equality $z(X,Y) + w(X,Y) = 1$. This
yields:

$$\mathbb{E}_{\mathcal{D}_{i-1}}[z(X,Y) \cdot (a_i(X) - 3 b_i(Y))] .$$

Similarly, by exchanging terms, the second constraint of (P) can be replaced with the equivalent
form:

$$\mathbb{E}_{\mathcal{D}_{i-1}}[z(X,Y) \cdot (b_i(Y) - 3 \cdot a_i(X))] \le \mathbb{E}_{\mathcal{D}_{i-1}}[b_i(Y) - 2 \cdot a_i(X)] .$$

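The resulting linear program, for which we need to find an optimal solution, is a variant of the
fractional knapsack problem of the following form (where the number $m$ of items is $|\operatorname{supp}(\mathcal{D}_{i-1})| \le |\mathcal{D}_{i-1}|$):

$$\begin{aligned}
\max \ & \sum_{j=1}^{m} v_j \cdot z_j \\
\text{s.t.} \ & \sum_{j=1}^{m} s_j \cdot z_j \le B \\
& 0 \le z_j \le 1 \quad \forall\, 1 \le j \le m
\end{aligned}$$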

The only change compared to a standard (fractional) knapsack problem is that $v_j$ and $s_j$ may 
have negative values (such values can be interpreted as an option to buy additional knapsack 
space). However, a simple modification of the so-called density rule can be used to solve the 
problem optimally. First, add to the solution all items with $v_j \geq 0$ and $s_j \leq 0$. Also, omit all items 
with $v_j \leq 0$ and $s_j \geq 0$. This leaves us with two types of items: “positive” items having $v_j, s_j > 0$, 
and “negative” items having $v_j, s_j < 0$. We sort the positive items in decreasing order of $v_j / s_j$ 
(intuitively, the value that we can earn per unit of the knapsack). Similarly, we sort the negative 
items in increasing order of $v_j / s_j$ (intuitively, the price we need to pay to buy a unit of the knapsack). The 
algorithm then starts by (fractionally) taking the first positive items until the knapsack becomes 
full (or we are out of positive items). We then continue (fractionally) taking positive items and 
negative items in parallel as long as the value per unit gained by the positive item is at least equal 
to the price per unit paid for the negative item. It is easy to see that this algorithm produces an 
optimal solution for the above fractional knapsack problem, and whenever it terminates there is 
at most a single (positive or negative) item taken fractionally. 

To complete the proof of the claim observe that when translating a solution for the fractional 
knapsack problem into a solution for the original formulation ( P ) we get two non-zero variables for 
every item taken fractionally, and one non-zero variable for every other item. □ 
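The modified density rule used in the proof can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the function name `signed_fractional_knapsack` is hypothetical, exact rational arithmetic is used for clarity, and the first loop (buying space when the capacity starts out negative) is an assumption added here, since the proof does not spell out that case.

```python
from fractions import Fraction

def signed_fractional_knapsack(values, sizes, capacity):
    """Maximize sum(v_j * z_j) s.t. sum(s_j * z_j) <= capacity, 0 <= z_j <= 1."""
    v = [Fraction(x) for x in values]
    s = [Fraction(x) for x in sizes]
    z = [Fraction(0)] * len(v)
    room = Fraction(capacity)
    positive, negative = [], []
    for j in range(len(v)):
        if v[j] >= 0 and s[j] <= 0:      # free value and free space: always take
            z[j] = Fraction(1)
            room -= s[j]
        elif v[j] <= 0 and s[j] >= 0:    # costs value and space: never take
            continue
        elif v[j] > 0:                   # "positive" item (v_j, s_j > 0)
            positive.append(j)
        else:                            # "negative" item (v_j, s_j < 0)
            negative.append(j)
    positive.sort(key=lambda j: v[j] / s[j], reverse=True)  # best density first
    negative.sort(key=lambda j: v[j] / s[j])                # cheapest space first
    p = q = 0

    # Assumption (not in the text): if the capacity is negative, space must be
    # bought from the cheapest negative items before anything else.
    while room < 0:
        if q == len(negative):
            raise ValueError("the knapsack instance is infeasible")
        j = negative[q]
        t = min(-room, (1 - z[j]) * -s[j])
        z[j] += t / -s[j]
        room += t
        if z[j] == 1:
            q += 1

    # Fill the knapsack with positive items by decreasing density.
    while room > 0 and p < len(positive):
        j = positive[p]
        t = min(room, (1 - z[j]) * s[j])
        z[j] += t / s[j]
        room -= t
        if z[j] == 1:
            p += 1

    # Trade: buy space from negative items while the density gained by the
    # current positive item is at least the price paid per unit of space.
    while p < len(positive) and q < len(negative):
        i, j = positive[p], negative[q]
        if v[i] / s[i] < v[j] / s[j]:
            break
        t = min((1 - z[i]) * s[i], (1 - z[j]) * -s[j])
        z[i] += t / s[i]
        z[j] += t / -s[j]
        if z[i] == 1:
            p += 1
        if z[j] == 1:
            q += 1
    return z

example = signed_fractional_knapsack([6, 10, -1, -6, 3, -2], [2, 5, -1, -4, -1, 1], 3)
```

Every step of the last loop completes at least one of the two current items, which is why at most one item ends up fractional. Sorting dominates the running time, matching the near linear bound of the claim.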

Lemma 4.3 is an immediate implication of Lemma 2.2 of [5]. For completeness, we give an 
independent proof of it. 
Lemma 4.3. For any subset $T$ and a distribution $\mathcal{D}$, 

\[ \mathbb{E}_{S \sim \mathcal{D}}[f(T \cup S)] \geq f(T) \cdot \min_{u \in \mathcal{N}} \Pr_{S \sim \mathcal{D}}[u \notin S] . \]

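Proof. Let $u_1, u_2, \ldots, u_n$ denote the elements of $\mathcal{N}$ sorted by a non-decreasing order of the 
probability $\Pr_{S \sim \mathcal{D}}[u \notin S]$. Additionally, let $A_i = \{u_1, u_2, \ldots, u_i\}$ be the set of the first $i$ elements in this 
order (for every $0 \leq i \leq n$). Then, 

\begin{align*}
\mathbb{E}_{S \sim \mathcal{D}}[f(T \cup S)]
  &= f(T) + \sum_{i=1}^{n} \mathbb{E}_{S \sim \mathcal{D}}[\mathbf{1}[u_i \in S] \cdot f(u_i \mid T \cup (A_{i-1} \cap S))] \\
  &\geq f(T) + \sum_{i=1}^{n} \mathbb{E}_{S \sim \mathcal{D}}[\mathbf{1}[u_i \in S] \cdot f(u_i \mid T \cup A_{i-1})]
   = f(T) + \sum_{i=1}^{n} f(u_i \mid T \cup A_{i-1}) \cdot \Pr_{S \sim \mathcal{D}}[u_i \in S] \\
  &= \Pr_{S \sim \mathcal{D}}[u_1 \notin S] \cdot f(T) + \sum_{i=1}^{n-1} \left( \Pr_{S \sim \mathcal{D}}[u_i \in S] - \Pr_{S \sim \mathcal{D}}[u_{i+1} \in S] \right) \cdot f(T \cup A_i) \\
  &\quad + \Pr_{S \sim \mathcal{D}}[u_n \in S] \cdot f(\mathcal{N}) \geq \Pr_{S \sim \mathcal{D}}[u_1 \notin S] \cdot f(T) ,
\end{align*}

where the first inequality holds by submodularity and the second inequality is based on the fact 
that $\Pr_{S \sim \mathcal{D}}[u_i \in S]$ is a non-increasing function of $i$ (and on the non-negativity of $f$). □ 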
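As a sanity check, the lemma can be tested numerically on a small instance. The sketch below is illustrative only: it picks the cut function of a hypothetical 5-vertex graph (`EDGES`) as the non-negative submodular function, draws random sets $T$ and random distributions with a fixed seed, and computes the difference between the two sides of the lemma exactly. All names are assumptions introduced here.

```python
from fractions import Fraction
import random

# Hypothetical test instance: an undirected graph on vertices 0..4.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)]
GROUND = range(5)

def cut(S):
    """Cut function of the graph: non-negative and submodular, not monotone."""
    return sum(1 for a, b in EDGES if (a in S) != (b in S))

def random_subset(rng):
    return frozenset(u for u in GROUND if rng.random() < 0.5)

def random_distribution(rng, support_size=4):
    """A list of (set, probability) pairs with exact rational probabilities."""
    weights = [rng.randint(1, 10) for _ in range(support_size)]
    total = sum(weights)
    return [(random_subset(rng), Fraction(w, total)) for w in weights]

def lemma_gap(f, T, dist):
    """E[f(T u S)] - f(T) * min_u Pr[u not in S]; the lemma says this is >= 0."""
    expected = sum(p * f(T | S) for S, p in dist)
    min_absent = min(sum(p for S, p in dist if u not in S) for u in GROUND)
    return expected - f(T) * min_absent

rng = random.Random(0)
worst_gap = min(lemma_gap(cut, random_subset(rng), random_distribution(rng))
                for _ in range(500))
assert worst_gap >= 0
```

A passing run is of course not a proof; it only confirms that the reconstructed statement is consistent on this instance.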


B Proof of the Tight Example for Algorithm 2 


In this section we continue the proof of the tight example for Algorithm 2. Let $f$ be the submodular 
function defined in Section 4.1. We analyze a possible execution of the algorithm given this function. 

Let us denote the elements of $Y$ by $u_1, u_2, \ldots, u_k$. For notational convenience, given an index 
$i > k$, we denote by $u_i$ the element $u_j$ having $i \equiv j \pmod{k}$. Given a value $z \in [0, 1]$, let us 
characterize a distribution $\mathcal{D}(z)$ as follows. The support of the distribution $\mathcal{D}(z)$ contains at most 
$2k$ states: 

• For every $1 \leq i \leq k$, the state $S_i^{+} = \{u_j\}_{j=i}^{i+\lfloor kz \rfloor}$ has a probability of $z - \lfloor kz \rfloor / k$. 

• For every $1 \leq i \leq k$, the state $S_i^{-} = \{u_j\}_{j=i}^{i+\lfloor kz \rfloor - 1}$ has a probability of $\lfloor kz + 1 \rfloor / k - z$. 

It can be verified that all the above probabilities add up to 1, and thus, $\mathcal{D}(z)$ is a valid distribution. 
Technically the above definition of $\mathcal{D}(z)$ sometimes defines multiple identical states (for example, 
the states $S_i^{-}$ are identical when $z < 1/k$). Whenever this happens, we formally unify these states 
and give the unified state a probability equal to the sum of the probabilities of the unified states. 
In the rest of the proof we ignore that possibility for simplicity. 

Intuitively, $\mathcal{D}(z)$ is a distribution over two types of subsets of $Y$: cyclically continuous states of 
size $1 + \lfloor kz \rfloor$ and cyclically continuous states of size $\lfloor kz \rfloor$. The distribution is symmetrical in the 
sense that all the cyclically continuous states of a given length have equal probabilities. 


Observation B.1. For every $z \in [0, 1]$ and element $u \in \mathcal{N}$, 

\[ \Pr_{S \sim \mathcal{D}(z)}[u \in S] = \begin{cases} z & \text{if } u \in Y , \\ 0 & \text{if } u \in O . \end{cases} \]
Proof. It is clear that elements of $O$ never appear in a set distributed like $\mathcal{D}(z)$, thus, we consider 
only an element $u \in Y$. By symmetry $u$ appears in $\lfloor kz \rfloor + 1$ cyclically continuous states of size 
$\lfloor kz \rfloor + 1$ and $\lfloor kz \rfloor$ cyclically continuous states of size $\lfloor kz \rfloor$. Thus, 

\begin{align*}
\Pr_{S \sim \mathcal{D}(z)}[u \in S]
  &= (\lfloor kz \rfloor + 1) \cdot (z - \lfloor kz \rfloor / k) + \lfloor kz \rfloor \cdot (\lfloor kz + 1 \rfloor / k - z) \\
  &= \lfloor kz \rfloor \cdot (\lfloor kz + 1 \rfloor - \lfloor kz \rfloor) / k + (z - \lfloor kz \rfloor / k) = z . \qquad \Box
\end{align*}
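The definition of $\mathcal{D}(z)$ and the observation above are easy to check mechanically. The following sketch builds the distribution over cyclically continuous subsets of $Y$ with exact rational arithmetic and verifies that the probabilities sum to 1 and that every element of $Y$ has marginal probability $z$. The names `build_distribution` and `marginal` are hypothetical, elements are indexed $0, \ldots, k-1$ instead of $u_1, \ldots, u_k$, and identical states are unified by using a dictionary keyed by the set.

```python
from fractions import Fraction
from math import floor

def build_distribution(k, z):
    """Return D(z) as a dict mapping each state (a frozenset) to its probability."""
    z = Fraction(z)
    m = floor(k * z)                       # floor(kz)
    dist = {}
    for i in range(k):
        # Cyclically continuous states starting at element i.
        larger = frozenset((i + t) % k for t in range(m + 1))   # size 1 + floor(kz)
        smaller = frozenset((i + t) % k for t in range(m))      # size floor(kz)
        for state, prob in ((larger, z - Fraction(m, k)),
                            (smaller, Fraction(m + 1, k) - z)):
            # Identical states are unified and their probabilities summed.
            dist[state] = dist.get(state, Fraction(0)) + prob
    return dist

def marginal(dist, u):
    """Probability that element u belongs to a set drawn from dist."""
    return sum(p for state, p in dist.items() if u in state)

k = 5
for i in range(k + 1):
    z_i = 1 - (1 - Fraction(1, k)) ** i    # the values z_i used in the tight example
    dist = build_distribution(k, z_i)
    assert sum(dist.values()) == 1
    assert all(marginal(dist, u) == z_i for u in range(k))
```

States of probability 0 (which arise when $kz$ is an integer) are kept in the dictionary; they do not affect either check.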

Lemma B.2. For every $0 \leq i \leq k$, Algorithm 2 may set $\mathcal{D}_i = \mathcal{D}(z_i)$, where $z_i = 1 - (1 - 1/k)^i$. 

Proof. We prove the lemma by induction. Notice that $\mathcal{D}(z_0)$ puts all the probability on the empty 
set, thus, it is identical to $\mathcal{D}_0$. This completes the proof of the base case. Next, assume that 
Algorithm 2 chose $\mathcal{D}_{i-1} = \mathcal{D}(z_{i-1})$ for some $1 \leq i \leq k$, and let us prove that it can end up with 
$\mathcal{D}_i = \mathcal{D}(z_i)$. Let us start with analyzing the marginal contributions of the various elements to a 
random set from $\mathcal{D}(z_{i-1})$. 

Let $z'_{i-1} = \lfloor kz_{i-1} \rfloor / k$, and note that $z'_{i-1} \leq z_{i-1} < z'_{i-1} + 1/k$. Consider an arbitrary element 
$u_o \in O$. Every set $S \in \mathrm{supp}(\mathcal{D}_{i-1})$ contains either $kz'_{i-1}$ or $kz'_{i-1} + 1$ elements of $Y$, and thus, by 
submodularity, we can upper bound the expected marginal contribution of $u_o$ to a random set of 
$\mathcal{D}_{i-1}$ with its marginal contribution to a set containing $kz'_{i-1}$ elements of $Y$. By dH) we now get: 

| S)] < ^ ^(1 - z\_ i) - (g(4- i) + — 

< ^ ((1 ~4-i) - ( z i-i - 1 ) ln ( 1 ~ z i-i)) = ^(l-^-i)'(l + ln(l-4-i)) • 

On the other hand, consider an element $u_y \in Y$. A random set from $\mathcal{D}_{i-1}$ contains $u_y$ with 
probability $z_{i-1}$, in which case the marginal contribution of $u_y$ is 0. Every other set $S$ in the 
distribution contains either $kz'_{i-1}$ or $kz'_{i-1} + 1$ elements of $Y$, and thus, by submodularity, we can 
lower bound the expected marginal contribution of $u_y$ to such a set with its marginal contribution 
to a set containing $kz'_{i-1} + 1$ elements of $Y$. By © we now get: 

[/K I s )l = ( 1 ~ z i- 1) • (g( z 'i -i + 2 A) - 9(4-1 + V fc ) + 

> j: ■ (i _ z i—i) ■ (g\ z i -1 + 2 / fc ) + ( 4 ) 

> - • (1 - z\_ 1 - l /k) • ^1 + ln(l - 4-1 — 2 / k ) + > 

where Inequality (4) follows by the concavity of $g$. Using the two inequalities we get, 


| 5)] - E s-Vt-Muo I ^)] 


- r (1 _ ^ ( ln ( x ” mi-Va) + 1 ) ~ ¥ ' ( x+ln(1 “ ^ ~ 2/k) + 1 ) 

2 


-4kV"VT) + kl P 


> 


1 


2e 


+ 


k \ k (1 - f) kI k 2 


1 / 4e A 2 

~ e ■ k \ k + k ) k 2 ~ 


(5) 

( 6 ) 
(7) 
where Inequality (5) follows for a large enough $\ell$ ($\geq 4e$) since $k \geq \ell$ by our assumption and 
$0 \leq z'_{i-1} \leq 1 - e^{-1}$. Inequality (6) follows by the inequality $\ln(1 - y) \geq -y/(1 - y)$, which holds for 
$y \in [0, 1)$. Finally, Inequality (7) and the last inequality both follow by considering a large enough 
$\ell$ ($\geq 6e$) and recalling that $k \geq \ell$. 

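Since the last inequality holds for every pair of elements $u_o \in O$ and $u_y \in Y$, it implies that 
the set $M_i$ chosen by the algorithm is exactly $Y$. Given this observation, the formulation (P) of 
Algorithm 2 in iteration $i$ becomes: 

\[
\begin{array}{lll}
(P) & \max \ \sum_{u \in M_i} \mathbb{E}_{S \sim \mathcal{D}_{i-1}}[x(u, S) \cdot f(u \mid S)] & \\
 & \mathbb{E}_{S \sim \mathcal{D}_{i-1}}[x(u, S)] \leq 1/k \cdot (1 - z_{i-1}) & \forall u \in Y \\
 & \sum_{u \in M_i} x(u, S) + t(S) = 1 & \forall S \in \mathrm{supp}(\mathcal{D}_{i-1}) \\
 & x(u, S), t(S) \geq 0 & \forall u \in M_i, S \in \mathrm{supp}(\mathcal{D}_{i-1})
\end{array}
\]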

We need to show that there exists an optimal extreme point solution for this formulation which 
makes the algorithm set $\mathcal{D}_i = \mathcal{D}(z_i)$. There are two cases to consider. If $\mathcal{D}(z_{i-1})$ and $\mathcal{D}(z_i)$ have 
the same states (i.e., $\lfloor kz_{i-1} \rfloor = \lfloor kz_i \rfloor$), then Algorithm 2 can come up with a solution $x^*$ for (P) 
assigning $x(u_{j + \lfloor kz_{i-1} \rfloor}, S_j^{-}) = (z_i - z_{i-1}) / \Pr_{\mathcal{D}_{i-1}}[S_j^{-}]$ for every $1 \leq j \leq k$, where $S_j^{-}$ is the state of 
size $\lfloor kz_{i-1} \rfloor$ starting at $u_j$, and the value 0 to the other 
$x$ variables (the values of the $t$ variables are induced by the values of the $x$ variables, and thus, we 
do not state their assignment). The solution $x^*$ is feasible since, for every $1 \leq j \leq k$: 

\begin{align*}
\mathbb{E}_{S \sim \mathcal{D}_{i-1}}[x^*(u_{j + \lfloor kz_{i-1} \rfloor}, S)]
  &= \Pr_{\mathcal{D}_{i-1}}[S_j^{-}] \cdot \frac{z_i - z_{i-1}}{\Pr_{\mathcal{D}_{i-1}}[S_j^{-}]}
   = [1 - (1 - 1/k)^{i}] - [1 - (1 - 1/k)^{i-1}] \\
  &= (1 - 1/k)^{i-1} \cdot [1 - (1 - 1/k)] = 1/k \cdot (1 - z_{i-1}) .
\end{align*}

It can be checked that $x^*$ indeed leads Algorithm 2 to set $\mathcal{D}_i = \mathcal{D}(z_i)$. To see that $x^*$ is optimal, 
notice that it adds elements only to the smaller sets (which results in a larger marginal gain by 
submodularity), and it adds every element to the maximal extent allowed by the first type of 
constraints. Finally, to see that $x^*$ is an extreme point solution notice that it is the only solution 
maximizing the objective function $c \cdot x$, where $c$ is a vector taking the value 1 exactly in the 
coordinates for which $x^*$ is non-zero. 

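The second case we need to consider is when $\mathcal{D}(z_{i-1})$ and $\mathcal{D}(z_i)$ have different states (i.e., 
$1 + \lfloor kz_{i-1} \rfloor = \lfloor kz_i \rfloor$). In this case Algorithm 2 can come up with a solution $x^*$ for (P) assigning 
$x(u_{j + \lfloor kz_{i-1} \rfloor}, S_j^{-}) = 1$ and $x(u_{j + \lfloor kz_i \rfloor}, S_j^{+}) = k^{-1} \cdot (kz_{i-1} - z_{i-1} - \lfloor kz_{i-1} \rfloor) / \Pr_{\mathcal{D}_{i-1}}[S_j^{+}]$ for every $1 \leq j \leq k$, 
where $S_j^{-}$ and $S_j^{+}$ are the states of sizes $\lfloor kz_{i-1} \rfloor$ and $1 + \lfloor kz_{i-1} \rfloor$ starting at $u_j$, 
and the value 0 to the other $x$ variables. The solution $x^*$ is feasible since, for every $1 \leq j \leq k$ (with state indices taken cyclically): 

\begin{align*}
\mathbb{E}_{S \sim \mathcal{D}_{i-1}}[x^*(u_{j + \lfloor kz_{i-1} \rfloor}, S)]
  &= \Pr_{\mathcal{D}_{i-1}}[S_j^{-}] + \Pr_{\mathcal{D}_{i-1}}[S_{j-1}^{+}] \cdot \frac{kz_{i-1} - z_{i-1} - \lfloor kz_{i-1} \rfloor}{k \cdot \Pr_{\mathcal{D}_{i-1}}[S_{j-1}^{+}]} \\
  &= \lfloor kz_{i-1} + 1 \rfloor / k - z_{i-1} + \frac{kz_{i-1} - z_{i-1} - \lfloor kz_{i-1} \rfloor}{k} = 1/k \cdot (1 - z_{i-1}) .
\end{align*}

It can be checked that $x^*$ again leads Algorithm 2 to set $\mathcal{D}_i = \mathcal{D}(z_i)$. To see that $x^*$ is optimal, notice 
that it adds as many elements as possible to the smaller sets (which results in a larger marginal 
gain by submodularity), and only the remaining capacity given by the first type of constraints is 
used to add elements to the larger sets. Finally, to see that $x^*$ is an extreme point solution notice 
that it is the only solution maximizing the objective function $c \cdot x$, where $c$ is a vector taking the 
value $k + 1$ in the coordinates for which $x^*$ is 1 and the value 1 in the other coordinates for which 
$x^*$ is non-zero. □ 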


To complete the analysis of our bad instance, notice that $O$ is a feasible solution and $f(O) = 1$. 
On the other hand, Lemma B.2 shows that Algorithm 2 may terminate with a distribution over 
subsets of $Y$. The value of $f$ for any such set $S$ is at most: 


m = g 



e -151 1 t 1 ^ 

+ ^k 1 - e + k =e + ° 